A latent shared-component generative model for real-time disease surveillance using Twitter data

نویسندگان

  • Roberto C. S. N. P. Souza
  • Denise E. F. de Brito
  • Renato Assunção
  • Wagner Meira
چکیده

Exploiting the large amount of available data for addressing relevant social problems has been one of the key challenges in data mining. Such efforts have been recently named“data science for social good” and attracted the attention of several researchers and institutions. We give a contribution in this objective in this paper considering a difficult public health problem, the timely monitoring of dengue epidemics in small geographical areas. We develop a generative simple yet effective model to connect the fluctuations of disease cases and disease-related Twitter posts. We considered a hidden Markov process driving both, the fluctuations in dengue reported cases and the tweets issued in each region. We add a stable but random source of tweets to represent the posts when no disease cases are recorded. The model is learned through a Markov chain Monte Carlo algorithm that produces the posterior distribution of the relevant parameters. Using data from a significant number of large Brazilian towns, we demonstrate empirically that our model is able to predict well the next weeks of the disease counts using the tweets and disease cases jointly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring the spatial patterns of three prevalent cancer latent risk factors in Iran; Using a shared component model

Background and aims: The aim of this study was the modeling of the incidence rates of Colorectal, breast and prostate cancers using a shared component model in order to explore the spatial pattern of their shared risk factors (i.e., obesity and low physical activity) affecting on cancer incidence, and also to estimate the relative weight of these shared components. Methods: In this study,...

متن کامل

Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...

متن کامل

Syndromic Surveillance using Generic Medical Entities on Twitter

Public health surveillance is challenging due to difficulties accessing medical data in real-time. We present a novel, effective and computationally inexpensive method for syndromic surveillance using Twitter data. The proposed method uses a regression model on a database previously built using named entity recognition to identify mentions of symptoms, disorders and pharmacological substances o...

متن کامل

Forecasting Word Model: Twitter-based Influenza Surveillance and Prediction

Because of the increasing popularity of social media, much information has been shared on the internet, enabling social media users to understand various real world events. Particularly, social media-based infectious disease surveillance has attracted increasing attention. In this work, we specifically examine influenza: a common topic of communication on social media. The fundamental theory of...

متن کامل

A two-component model for counts of infectious diseases.

We propose a stochastic model for the analysis of time series of disease counts as collected in typical surveillance systems on notifiable infectious diseases. The model is based on a Poisson or negative binomial observation model with two components: a parameter-driven component relates the disease incidence to latent parameters describing endemic seasonal patterns, which are typical for infec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1510.05981  شماره 

صفحات  -

تاریخ انتشار 2015